Association pattern mining of intron retention events in human based on hybrid learning machine.

Authors

  • Hae-Jin Hu
  • Sung-Ho Goh
  • Yeon-Su Lee
Abstract

Alternative splicing is a major contributor to protein diversity, and aberrant splicing is known to be one of the main causes of genetic disorders such as cancer. Many statistical and computational approaches have identified several major factors that determine the splicing event, such as exon/intron length, splice site strength, and density of splicing enhancers or silencers. These factors may be correlated with one another and thus result in a specific type of splicing, but there has not been a systematic approach to extracting comprehensible association patterns. Here, we attempted to understand the decision-making process of the learning machine on intron retention events. We adopted a hybrid learning machine approach, combining a random forest with an association rule mining algorithm, to determine the governing factors of intron retention events and their combined effect on the decision-making process. By quantizing all candidate features into five categorical levels, we made the generated rules easier to understand. An interesting finding from the random forest algorithm is that only adenine- and thymine-based triplets such as ATA, TTA, and ATT, and not the known intronic splicing enhancer triplet GGG, emerged as significant features. The rules generated by the association rule mining algorithm also show that constitutive introns are generally characterized by high adenine- and thymine-based triplet frequency (level 3 and above), 3' and 5' splice site scores, exonic splicing silencer scores, and intron length, whereas retained introns are characterized by low levels of the corresponding scores.
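The rule-mining half of the approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the random forest feature-selection step is omitted, equal-width binning is only an assumed way of producing the five levels (the abstract does not say how the levels are defined), and the feature names, values, and thresholds below are hypothetical toy data chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def to_levels(values, n_levels=5):
    """Map numeric values to levels 1..n_levels using equal-width bins.

    Equal-width binning is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_levels or 1.0  # avoid division by zero
    return [min(int((v - lo) / width) + 1, n_levels) for v in values]


def mine_class_rules(rows, labels, min_support=0.3, min_confidence=0.8,
                     max_items=2):
    """Find rules of the form {feature=level, ...} -> class label.

    rows   : list of dicts mapping feature name -> level
    labels : list of class labels, one per row
    Returns (antecedent, label, support, confidence) tuples, where
    support = P(antecedent and label) and
    confidence = P(label | antecedent).
    """
    n = len(rows)
    antecedent_counts = Counter()
    joint_counts = Counter()
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_items + 1):
            for combo in combinations(items, k):
                antecedent_counts[combo] += 1
                joint_counts[(combo, label)] += 1

    rules = []
    for (combo, label), joint in joint_counts.items():
        support = joint / n
        confidence = joint / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    # Most confident, best supported rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data: six introns, two made-up features.
ata_freq = [0, 1, 2, 8, 9, 10]
intron_length = [100, 150, 200, 900, 1000, 1100]
labels = ["retained"] * 3 + ["constitutive"] * 3

ata_levels = to_levels(ata_freq)
length_levels = to_levels(intron_length)
rows = [{"ATA_freq": a, "intron_length": b}
        for a, b in zip(ata_levels, length_levels)]

rules = mine_class_rules(rows, labels)
for antecedent, label, support, confidence in rules:
    conditions = " AND ".join(f"{name}=level {lvl}" for name, lvl in antecedent)
    print(f"IF {conditions} THEN {label} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

Each printed rule pairs one or two feature levels with a class, which is the kind of readable pattern the five-level quantization is meant to support. In a fuller pipeline, a random forest would first rank the candidate features, and only the top-ranked ones would be passed to the rule miner.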

Similar resources

A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements

Financial statement fraud has increasingly become a serious problem for business, government, and investors. In fact, this threatens the reliability of capital markets, corporate heads, and even the audit profession. Auditors in particular face their apparent inability to detect large-scale fraud, and there are various ways to identify this problem. In order to identify this problem, the majori...

A Data Mining approach for forecasting failure root causes: A case study in an Automated Teller Machine (ATM) manufacturing company

Based on the findings of Massachusetts Institute of Technology, organizations’ data double every five years. However, the rate of using data is 0.3. Nowadays, data mining tools have greatly facilitated the process of knowledge extraction from a welter of data. This paper presents a hybrid model using data gathered from an ATM manufacturing company. The steps of the research are based on CRISP-D...

Sports Result Prediction Based on Machine Learning and Computational Intelligence Approaches: A Survey

In the current world, sports produce considerable statistical information about each player, team, games, and seasons. Traditional sports science believed science to be owned by experts, coaches, team managers, and analyzers. However, sports organizations have recently realized the abundant science available in their data and sought to take advantage of that science through the use of data mini...

IT Infrastructure Downtime Preemption using Hybrid Machine Learning and NLP

IT Infrastructure Management and server downtime have been an area of exploration by researchers and industry experts, for over a decade. Despite the research on web server downtime, system failure and fault prediction, etc., there is a void in the field of IT Infrastructure Downtime Management. Downtime in an IT Infrastructure can cause enormous financial, reputational and relationship losses ...

Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data

Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...

Journal:
  • Genes & genetic systems

Volume 85, Issue 6

Pages: -

Publication date: 2010